policy scope
Online Reinforcement Learning for Mixed Policy Scopes
Combination therapy refers to the use of multiple treatments (such as surgery, medication, and behavioral therapy) to cure a single disease, and has become a cornerstone for treating various conditions including cancer, HIV, and depression. All possible combinations of treatments lead to a collection of treatment regimens (i.e., policies) with mixed scopes, that is, differing specifications of what physicians can observe and which actions they should take depending on the context. In this paper, we investigate the online reinforcement learning setting for optimizing the policy space with mixed scopes. In particular, we develop novel online algorithms that achieve sublinear regret compared to an optimal agent deployed in the environment. The regret bound depends on the maximal cardinality of the induced state-action space associated with mixed scopes. We further introduce a canonical representation for an arbitrary subset of interventional distributions given a causal diagram, which leads to a non-trivial, minimal representation of the model parameters.
Contextual Causal Bayesian Optimisation
Vahan Arsenyan, Antoine Grosnit, Haitham Bou-Ammar
Causal Bayesian optimisation (CaBO) combines causality with Bayesian optimisation (BO) and shows that there are situations where the optimal reward is not achievable if causal knowledge is ignored. While CaBO exploits causal relations to determine the set of controllable variables to intervene on, it does not exploit purely observational variables and instead marginalises them. We show that, in general, using a subset of observational variables as context when choosing the values of interventional variables leads to lower cumulative regret. We propose a general framework of contextual causal Bayesian optimisation that efficiently searches through combinations of controlled and contextual variables, known as policy scopes, and identifies the one yielding the optimum. We highlight the difficulties that arise when the causal acquisition function currently used in CaBO is applied to select the policy scope in contextual settings, and propose a selection mechanism based on multi-armed bandits. We analytically show that well-established methods, such as contextual BO (CoBO) or CaBO, are not able to achieve the optimum in some cases, and empirically show that the proposed method achieves sub-linear regret in various environments and under different configurations.
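To make the idea of bandit-based selection over policy scopes concrete, here is a minimal sketch in which each candidate scope is treated as one arm. It assumes UCB1 as the bandit rule, which the abstract does not specify, and it replaces the per-scope Bayesian optimisation step with a simulated Bernoulli reward. The scope labels, the mean rewards, and the function names `ucb1_select` and `run_scope_bandit` are all hypothetical and serve only as illustration.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the index of the arm (policy scope) with the highest UCB1 score.

    counts[i] is how often arm i was played, sums[i] its total reward,
    and t the current round (1-based).
    """
    # Play every arm once before using the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [
        sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_scope_bandit(scope_means, horizon, seed=0):
    """Simulate scope selection; return how often each scope was chosen.

    scope_means maps a scope label to a made-up Bernoulli mean reward.
    In a real system the reward would come from evaluating a policy
    within the chosen scope, not from a fixed coin flip (assumption).
    """
    rng = random.Random(seed)
    scopes = list(scope_means)
    counts = [0] * len(scopes)
    sums = [0.0] * len(scopes)
    for t in range(1, horizon + 1):
        i = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < scope_means[scopes[i]] else 0.0
        counts[i] += 1
        sums[i] += reward
    return dict(zip(scopes, counts))


if __name__ == "__main__":
    # Hypothetical scopes: (controlled variables | context variables).
    pulls = run_scope_bandit(
        {"X1 | none": 0.2, "X1 | C1": 0.9, "X1, X2 | C1": 0.5},
        horizon=2000,
    )
    print(pulls)
```

With a rule of this kind, scopes whose observed rewards are low are tried progressively less often, which is the behaviour one would expect from a selection mechanism that aims for sub-linear regret.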